Line Detection and Segmentation in Historical Church Registers

Authors

  • Markus Feldbach
  • Klaus D. Tönnies
Abstract

To automatically acquire the information recorded in church registers and other historical documents, the writing on these documents has to be recognized. This paper describes algorithms for transforming the paper documents into a representation of the text suitable as input for an automatic text recognizer. The automatic recognition of old handwriting is difficult for two main reasons: lines of text are generally not straight, and the ascenders and descenders of adjacent lines interfere with each other. The algorithms described in this paper reconstruct the path of each line of text by gradually constructing line segments until a unique line of text is formed. In addition, the individual lines are segmented and output in the form of a raster image. The method was applied to church registers written between the 17th and 19th centuries. Line segmentation was found to be successful in 97% of all samples.
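The idea of growing a text line piece by piece can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail; it is a simple greedy chaining of candidate baseline points, swapped in for illustration. The function name `chain_points_into_lines`, the parameters `max_dx` and `max_dy`, and the cost weighting are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of a candidate baseline point


def chain_points_into_lines(
    points: List[Point], max_dx: int = 40, max_dy: int = 8
) -> List[List[Point]]:
    """Greedily link candidate points into left-to-right polylines.

    Hypothetical illustration only: each point is appended to the nearest
    existing polyline whose last point lies strictly to the left within
    max_dx pixels and within max_dy pixels vertically; otherwise the point
    starts a new polyline.
    """
    lines: List[List[Point]] = []
    for x, y in sorted(points):  # process points from left to right
        best = None
        best_cost = None
        for line in lines:
            last_x, last_y = line[-1]
            dx = x - last_x
            dy = abs(y - last_y)
            if 0 < dx <= max_dx and dy <= max_dy:
                # Assumed weighting: vertical jumps cost more than horizontal gaps.
                cost = dx + 4 * dy
                if best_cost is None or cost < best_cost:
                    best, best_cost = line, cost
        if best is None:
            lines.append([(x, y)])  # no compatible polyline: start a new one
        else:
            best.append((x, y))  # extend the best-matching polyline
    return lines


# Two gently curved text lines, about 30 pixels apart.
candidates = [(0, 10), (20, 12), (40, 15), (60, 13),
              (0, 40), (20, 43), (40, 41), (60, 44)]
for polyline in chain_points_into_lines(candidates):
    print(polyline)
```

Because each polyline follows its own last point rather than a fixed horizontal band, it can track a line of text that is not straight, which is the first difficulty the abstract names. Separating touching ascenders and descenders would need additional logic that this sketch does not attempt.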

Similar resources

Robust Line Detection in Historical Church Registers

To automatically acquire information recorded in church registers and other historical documents, the text of such documents needs to be segmented prior to automatic reading. Segmentation of old handwriting is difficult for two main reasons: lines of text are generally not straight, and the ascenders and descenders of adjacent lines interfere with each other. The algorithms described in ...

Segmentation of the Date in Entries of Historical Church Registers

Handwriting recognition requires a prior segmentation of text lines, which is a challenging task, especially for historical scripts. Using the date in entries of historical church registers as an example, we present an approach that enables segmentation by using additional knowledge about the word sequence. The algorithm is based on probability distribution curves and a neural network, which assess...

Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic A-Priori-Knowledge with Local Features

The recognition of script in historical documents requires suitable techniques in order to identify single words. Segmentation of lines and words is a challenging task because lines are not straight and words may intersect within and between lines. For correct word segmentation, the conventional analysis of distances between text objects needs to be supplemented by a second component predicting...

DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

To control the exponential growth of malware files, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism employed by malware make it difficult for signature-based systems to detect sophisticated malware files. Dynamic analysis of run-time behavior provides a better technique to identify the threat. In t...

Radial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting

Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...

Publication date: 2001